Brief Overview

In this session, we will use the Black Friday data set from Kaggle to learn how to make the following graphical displays.

Graphical Displays

  • Categorical Data
    • Bar Chart
    • Pie Chart
  • Quantitative Data
    • Histogram
    • Boxplot
    • Scatterplot
    • Line

Common Arguments

Here is a list of common arguments:

  • col: a vector of colors
  • main: title for the plot
  • xlim or ylim: limits for the x or y axis
  • xlab or ylab: label for the x or y axis
  • font: font used for text (1 = plain, 2 = bold, 3 = italic, 4 = bold italic)
  • font.axis: font used for axis annotation
  • cex.axis: text size (magnification) for axis annotation
  • font.lab: font used for the x and y labels
  • cex.lab: text size (magnification) for the x and y labels
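
As a rough illustration of how these arguments fit together, the sketch below passes most of them to a single plot() call. The x and y values are made up for the example and are not taken from the Black Friday data; pch is an extra argument used only to draw filled points.

```r
# Toy values for illustration only; not taken from the Black Friday data.
x <- 1:5
y <- c(2, 4, 3, 6, 5)

plot(
  x, y,
  pch       = 19,                                  # filled circles (not in the list above)
  col       = c("red", "orange", "darkgreen",
                "blue", "purple"),                 # one color per point
  main      = "Toy Example of Common Arguments",   # plot title
  xlim      = c(0, 6),                             # x-axis limits
  ylim      = c(0, 8),                             # y-axis limits
  xlab      = "x value",                           # x-axis label
  ylab      = "y value",                           # y-axis label
  font.axis = 3,                                   # italic axis annotation
  cex.axis  = 0.9,                                 # slightly smaller axis annotation
  font.lab  = 2,                                   # bold axis labels
  cex.lab   = 1.2                                  # slightly larger axis labels
)
```

The same arguments can generally be passed to the other base plotting functions used in this session.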

Data

First 500 Observations

Description

To understand customer purchase behavior across various products in different categories, the retail company “ABC Private Limited”, based in the United Kingdom, shared a purchase summary of various customers for selected high-volume products from last month. The data contains the following variables.

  • User_ID: User ID
  • Product_ID: Product ID
  • Gender: Sex of User
  • Age: Age in bins
  • Occupation: Occupation (Masked)
  • City_Category: Category of the City (A, B, C)
  • Stay_In_Current_City_Years: Number of years the user has stayed in the current city
  • Marital_Status: Marital Status
  • Product_Category_1: Product Category (Masked)
  • Product_Category_2: Additional category the product may belong to (Masked)
  • Product_Category_3: Additional category the product may belong to (Masked)
  • Purchase: Purchase Amount
Rows: 550,068
Columns: 12
$ User_ID                    <dbl> 1000001, 1000001, 1000001, 1000001, 1000002…
$ Product_ID                 <chr> "P00069042", "P00248942", "P00087842", "P00…
$ Gender                     <chr> "F", "F", "F", "F", "M", "M", "M", "M", "M"…
$ Age                        <chr> "0-17", "0-17", "0-17", "0-17", "55+", "26-…
$ Occupation                 <dbl> 10, 10, 10, 10, 16, 15, 7, 7, 7, 20, 20, 20…
$ City_Category              <chr> "A", "A", "A", "A", "C", "A", "B", "B", "B"…
$ Stay_In_Current_City_Years <chr> "2", "2", "2", "2", "4+", "3", "2", "2", "2…
$ Marital_Status             <dbl> 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0…
$ Product_Category_1         <dbl> 3, 1, 12, 12, 8, 1, 1, 1, 1, 8, 5, 8, 8, 1,…
$ Product_Category_2         <dbl> NA, 6, NA, 14, NA, 2, 8, 15, 16, NA, 11, NA…
$ Product_Category_3         <dbl> NA, 14, NA, NA, NA, NA, 17, NA, NA, NA, NA,…
$ Purchase                   <dbl> 8370, 15200, 1422, 1057, 7969, 15227, 19215…
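
A listing like the one above can be produced once the data is read into R. The following is a minimal base-R sketch; the helper name load_purchases and the file name BlackFriday.csv are assumptions made for this example, so adjust them to your own copy of the Kaggle file. str() is used here as the base-R counterpart of the listing shown above.

```r
# Read the purchase data and print its structure.
# load_purchases is a hypothetical helper written for this example.
load_purchases <- function(path) {
  dat <- read.csv(path, stringsAsFactors = FALSE)
  str(dat)   # column names, types, and the first few values
  dat
}

path <- "BlackFriday.csv"   # assumed file name; change to your own location
if (file.exists(path)) {
  black_friday <- load_purchases(path)
  first_500 <- head(black_friday, 500)    # the first 500 observations
  print(colSums(is.na(black_friday)))     # number of missing values per column
}
```

Counting the missing values per column is worthwhile here because Product_Category_2 and Product_Category_3 contain NA entries.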

Bar Chart

A bar chart is a graphical display well suited to a general audience. Here, we study the distribution of the age groups of the company’s customers who purchased products on Black Friday.

Usage: barplot(height, …)

A bar chart can be horizontal or vertical (set horiz = TRUE for horizontal bars). Using the argument col, we can assign a color to the bars, either by name or by RGB color code. The argument main sets the title of the figure.

Note: The margins of a figure can be set using par(mar = c(bottom, left, top, right)).
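
A minimal sketch of both orientations follows. The age values are a small made-up sample standing in for the Age column, and the colors and margin sizes are arbitrary choices for the example.

```r
# Toy age bins for illustration only; the Age column would be used instead.
age <- c("0-17", "26-35", "26-35", "36-45", "55+", "26-35", "18-25", "36-45")
age_counts <- table(age)   # frequency of each age bin

old_par <- par(mar = c(5, 6, 4, 2))   # margins: bottom, left, top, right

# Vertical bar chart with a named color
barplot(age_counts, col = "steelblue",
        main = "Customers by Age Group",
        xlab = "Age group", ylab = "Count")

# Horizontal bar chart with a color given as an RGB code
barplot(age_counts, horiz = TRUE, las = 1,
        col = rgb(230, 159, 0, maxColorValue = 255),
        main = "Customers by Age Group", xlab = "Count")

par(old_par)   # restore the previous margins
```

barplot() expects the bar heights, so the raw categories are first counted with table(). The wider left margin leaves room for the horizontal category labels drawn with las = 1.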

Analysis

Vertical Bar Chart

Horizontal Bar Chart

Pie Chart

Similarly, we can use a pie chart to study the distribution of the city category.

Usage: pie(x, …)

Tip: Use a color palette to choose colors (Google search: color scheme generator).
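
As a small sketch, the code below draws a pie chart with percentage labels. The city values are made up for the example and stand in for the City_Category column; the three hex colors are one arbitrary palette.

```r
# Toy city categories for illustration only; the City_Category column would be used instead.
city <- c("A", "B", "B", "C", "B", "C", "A", "B")
city_counts <- table(city)                      # frequency of each category
pct <- round(100 * prop.table(city_counts))     # share of each category in percent

pie(city_counts,
    labels = paste0(names(city_counts), " (", pct, "%)"),
    col    = c("#1B9E77", "#D95F02", "#7570B3"),  # one color per slice
    main   = "Distribution of City Category")
```

Putting the percentages in the labels helps because slice sizes alone are hard to compare by eye.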

Distribution of City Category

Histogram

A histogram is used when we want to study the distribution of a quantitative variable. Here, we study the distribution of customer purchase amounts.

Usage: hist(x, …)
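
For a quick sketch, the code below draws a histogram of the first seven Purchase values shown in the data listing. With the full data, the whole Purchase column would be passed instead; the bin width of 5,000 is an arbitrary choice for the example.

```r
# First seven Purchase values from the data listing, used as a small sample.
purchase <- c(8370, 15200, 1422, 1057, 7969, 15227, 19215)

h <- hist(purchase,
          breaks = seq(0, 20000, by = 5000),   # bins of width 5,000
          col    = "lightblue",
          main   = "Distribution of Purchase Amount",
          xlab   = "Purchase amount")

h$counts   # number of observations in each bin
```

hist() returns its bin boundaries and counts invisibly, so storing the result makes it easy to check what was plotted.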

Analysis

Boxplot

Boxplot 1

Boxplot 2